com.doclinx.ftxml
Class SRC2STF_PARMS

java.lang.Object
  |
  +--com.doclinx.ftxml.SRC2STF_PARMS

public final class SRC2STF_PARMS
extends java.lang.Object

Parameter block class that controls optional functionality during the text parsing phase. The TeraXML system can handle several data formats. XML is the primary format supported in the Java version. The SRC2STF_PARMS class provides a record structure that contains several options for controlling the parsing task.

See Also:
catSetParms, catAddFile.

Field Summary
static int ALT_DRI
          Default alternate DRI for hidden info.
 int dpapi_error
          Error code while building index
 java.lang.String ht_defFile
          deprecated
 int ht_docIdStart
          deprecated
 int ht_documentsProcessed
          Number of documents processed
 boolean ht_inputIsListOfFiles
          deprecated
 boolean ht_stopOnFileError
          Stop if encountering parse error
 boolean ht_warnUnknown
          Warn about unknown GIDs
static int IS_ADD
          CatalogItem attrs field value.
static int IS_ARCH1ST
          CatalogItem attrs field value.
static int IS_ARCHN
          CatalogItem attrs field value.
static int IS_AUTOTYPE
          catAddFile method filter parameter type
static int IS_COMP1ST
          CatalogItem attrs field value.
static int IS_COMPN
          CatalogItem attrs field value.
static int IS_DELETED
          CatalogItem attrs field value.
static int IS_ENTITY_EXTRACTED
          CatalogItem attrs field value.
static int IS_FILTER
          catAddFile method filter parameter mask
static int IS_FILTER1
          Alternate name for IS_HTML
static int IS_FILTER2
          Alternate name for IS_XML
static int IS_FILTER3
          Alternate name for IS_GENERIC
static int IS_FILTER4
          Alternate name for IS_TEXT
static int IS_FILTER5
          Alternate name for IS_AUTOTYPE
static int IS_FILTERED
          CatalogItem attrs field value.
static int IS_GENERIC
          catAddFile method filter parameter type: C++ only
static int IS_HTML
          catAddFile method filter parameter type
static int IS_PRIMARY
          CatalogItem attrs field value.
static int IS_TEXT
          catAddFile method filter parameter type
static int IS_UPDATE
          CatalogItem attrs field value.
static int IS_XML
          catAddFile method filter parameter type
 java.lang.String sr_addedText
          Other added (non-indexed) text
 java.lang.String sr_altTitle
          Alternate title
static int SR_APPEND_DATES
          sr_flags bit setting. -- Append normalized dates to added text field
 boolean sr_appendToOutput
          Append to output (else overwrite)
 com.doclinx.jftr.CharProp sr_charProp
          Internal use only (not user parameter).
 byte sr_contextAaidx
          Context aaidx value(context attr)
 java.lang.String sr_dateFormats
          Allow users to specify date formats for parsing -- Uses Java SimpleDateFormat format strings (in quotes) delimited by ';' Note the 4 defaults are: "MM/dd/yyyy";"MMMM dd,yyyy"; "yyyyMMdd";"MM/dd/yyyy HH:mm:ss"; format: "fmt1";"fmt2";"fmt3"
static int SR_DDOC
          sr_flags bit setting. -- Disable Built-in Doc Filter
 boolean sr_debug
          Turn on parser debug info output
static int SR_DOC
          File type value.
static int SR_DOCCONTEXT
          sr_flags bit setting.
static int SR_DPDF
          sr_flags bit setting. -- Disable Built-in PDF Filter
 boolean sr_enableJapanese
          deprecated.
 java.lang.String sr_encoding
          Text encoding to use (no detect)
static int SR_EXCLUDE_XMLATTR
          sr_flags bit setting.
 java.lang.String sr_excludeList
          File exclude list: Exclude format: *.xml;foo.*;file.ext
 com.doclinx.ftxml.AppParms sr_f1
          Parameter callback information -- Additional application data for a document.
 com.doclinx.ftxml.InputCallback sr_f2
          Input callback function -- Open InputStream for reading.
 int sr_filter
          Internal use only (not user parameter).
 int sr_flags
          SR_FLAGS (filter options), see flag values
 int sr_foldSettings
          Control bits for case folding
static int SR_GENERIC
          File type value.
static int SR_GENERICCONTEXT
          sr_flags bit setting.
static int SR_GENERICIDS
          sr_flags bit setting.
 java.lang.String sr_genericRoot
          Optional generic filter ROOT tag
static int SR_GENROOT_TYPE
          sr_flags bit setting.
 java.lang.String sr_globalParms
          Global user parameter data (use XML style tag).
 java.lang.String sr_gpConfig
          Internal use only (not user parameter).
 java.lang.String sr_gpDll
          Internal use only (not user parameter).
 com.doclinx.ftxml.GFilter sr_gpFilter
          Internal use only (not user parameter).
static int SR_HASCONTEXT
          sr_flags bit setting mask.
static int SR_HTML
          Return code indicating type of file found.
static int SR_HTML_HTM_CONTEXT
          sr_flags bit setting. -- html <html>,<title>, and <meta> context
static int SR_HTMLCONTEXT
          sr_flags bit setting.
static int SR_INCLUDE_CDDATA
          sr_includeWords parameter field values.
static int SR_INCLUDE_COLL_HDR
          sr_includeWords bit setting.
static int SR_INCLUDE_HTMLATTR
          sr_flags bit setting.
static int SR_INCLUDEDOCTYPE
          Permitted values for the sr_flags field.
 java.lang.String sr_includeList
          File include list: Include format: *.xml;foo.*;file.ext
 boolean sr_includePunctuation
          Place punct tokens in STF
 int sr_includeWords
          Control bits for word inclusion
 boolean sr_indexAltTitle
          true if indexing alt title
 boolean sr_indexModTime
          Index file modified time
 boolean sr_indexURL
          Index URL text (set in map file)
 java.lang.String sr_JDBCDoc
          Default JDBC Document wrapper
static int SR_KEEPXMLFROMPDF
          sr_flags bit setting.
 com.doclinx.jftr.Log sr_logFile
          Conversion information log file.
 java.lang.Object sr_map8
          Mapper for 8-bit encodings
 java.lang.String sr_mapDirectory
          Map directory for map files(.txt)
 int sr_maxWordChars
          Maximum length of a word(255 max)
 com.doclinx.ftxml.FileTime sr_modTime
          Internal use only (not user parameter).
static int SR_NOSPANSCRIPT
          sr_flags bit setting.
 java.lang.String sr_outputFile
          Internal use only (not user parameter).
static int SR_PARMCONTEXT
          sr_flags bit setting.
static int SR_PDF
          File type value.
static int SR_PDF_CONTENTORDER
          sr_flags bit setting. -- Enable raw order interpretation of PDF
static int SR_PDF_HILITE
          sr_flags bit setting. -- Enable built-in PDF filter to collect hilite info.
static int SR_PDF_PHYSORDER
          sr_flags bit setting. -- Enable physical order interpretation of PDF
static int SR_PDFCONTEXT
          sr_flags bit setting.
 java.lang.String sr_processFile
          Where to put catAddFile process file.
static int SR_PROMOTE_ALTTITLE
          sr_flags bit setting. -- Promote alternate title when title empty.
 java.lang.String sr_regExpression
          Word break regular exp (C++ only)
static int SR_REMOVE_FILEEXT
          sr_flags bit setting. -- Remove file extension from file names in catalog
static int SR_REMOVE_FILEPATH
          sr_flags bit setting. -- Remove path from file names in catalog
static int SR_SET_WORDBRK_EXT
          sr_flags bit setting. -- Set word break by file extent.
 java.lang.String sr_stfFile
          Where to put output token file
static int SR_TEXT
          File type value.
 java.lang.String sr_URL
          URL text
static int SR_USE_ALTDRI
          sr_flags bit setting. -- Place any added text into ALT_DRI
static int SR_USEDLLCALLBACK
          sr_flags bit setting.
 java.lang.String sr_vsdf
          Internal use only (not user parameter).
static int SR_XML
          File type value.
static int SR_XMLCONTEXT
          sr_flags bit setting.
static int SR_XMLSTRICT
          sr_flags bit setting.
 
Constructor Summary
SRC2STF_PARMS()
          Constructor with default values for parse control parameters.
 
Method Summary
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

SR_INCLUDEDOCTYPE

public static final int SR_INCLUDEDOCTYPE
Permitted values for the sr_flags field. The flags field controls some parsing options, most notably context tree generation.
       SR_INCLUDEDOCTYPE   - Include document type integer in DRI 2 of the
                             parsed output (STF). For all parser types.
       SR_NOSPANSCRIPT     - Include words found in HTML javascript block.
       SR_XMLSTRICT        - Return parser error if an XML file does not begin
                             with .
       SR_GENROOT_TYPE     - Generate a "Root" name from the numeric type
                             of a document, so that each file's parameter set
                             is grouped by document type. NOTE: this option
                             is currently only used in the C++ version.
       SR_HTMLCONTEXT      - Build context tree if HTML (not recommended when
                             HTML is not "well-formed").
       SR_XMLCONTEXT       - Build context tree for XML files.
       SR_GENERICCONTEXT   - Build context tree when using generic filter (all
                             other supported file types).
       SR_PARMCONTEXT      - Enable context for application passed parameters.
       SR_GENERICIDS       - Use sr_contextAaidx value as attribute for values
                             generated from generic tag values. Do not combine
                             with use of context trees. Note: this option is
                             currently only used in the C++ version.                        
       SR_USEDLLCALLBACK   - ** C++ version ONLY!
       SR_INCLUDE_HTMLATTR - Include HTML tag attributes (default is OFF) 
       SR_EXCLUDE_XMLATTR  - Exclude XML attributes (default is ON)
       SR_REMOVE_FILEPATH  - Remove path prefix from file name stored in catalog.
       SR_REMOVE_FILEEXT   - Remove file extent from file name stored in catalog.
       SR_DPDF             - Disable Built-in PDF Filter             
       SR_PDF_HILITE       - Enable built-in PDF filter to collect hilite info.
       SR_PDF_CONTENTORDER - Read PDF in content (raw) order - default: reading order
       SR_PDF_PHYSORDER    - Read PDF in physical order - default: reading order
       SR_DDOC             - Disable Built-in DOC Filter             
       SR_HTML_HTM_CONTEXT - Limit HTML context to <html>,<title>, and <meta>
       SR_SET_WORDBRK_EXT  - Get word break map file based upon file extent.
       SR_PROMOTE_ALTTITLE - Use alternate title for title (if empty)
    

See Also:
Constant Field Values
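
The sr_flags field is an int built from the bit settings listed above. The following is a minimal sketch of combining, testing, and clearing such bits. The class SrFlagsSketch and the numeric values in it are placeholders invented for illustration; the real constants are defined in SRC2STF_PARMS and their actual values are listed under Constant Field Values.

```java
// Stand-in for illustration only. The real constants live in
// com.doclinx.ftxml.SRC2STF_PARMS; the numeric values below are
// placeholders, not the library's actual values.
final class SrFlagsSketch {
    static final int SR_INCLUDEDOCTYPE  = 1 << 0; // placeholder value
    static final int SR_XMLCONTEXT      = 1 << 1; // placeholder value
    static final int SR_HTMLCONTEXT     = 1 << 2; // placeholder value
    static final int SR_REMOVE_FILEPATH = 1 << 3; // placeholder value

    /** Returns true when every bit of flag is set in flags. */
    static boolean isSet(int flags, int flag) {
        return (flags & flag) == flag;
    }

    /** Returns flags with the bits of flag cleared. */
    static int clear(int flags, int flag) {
        return flags & ~flag;
    }

    public static void main(String[] args) {
        // Combine two options with bitwise OR.
        int flags = SR_XMLCONTEXT | SR_REMOVE_FILEPATH;
        System.out.println("XML context set:  " + isSet(flags, SR_XMLCONTEXT));
        System.out.println("HTML context set: " + isSet(flags, SR_HTMLCONTEXT));

        // Turn one option back off without disturbing the others.
        flags = clear(flags, SR_REMOVE_FILEPATH);
        System.out.println("Remove path set:  " + isSet(flags, SR_REMOVE_FILEPATH));
    }
}
```

With the real class, the equivalent assignment would presumably be parms.sr_flags = SRC2STF_PARMS.SR_XMLCONTEXT | SRC2STF_PARMS.SR_REMOVE_FILEPATH.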

SR_NOSPANSCRIPT

public static final int SR_NOSPANSCRIPT
sr_flags bit setting.

See Also:
Constant Field Values

SR_XMLSTRICT

public static final int SR_XMLSTRICT
sr_flags bit setting.

See Also:
Constant Field Values

SR_GENROOT_TYPE

public static final int SR_GENROOT_TYPE
sr_flags bit setting.

See Also:
Constant Field Values

SR_HTMLCONTEXT

public static final int SR_HTMLCONTEXT
sr_flags bit setting.

See Also:
Constant Field Values

SR_XMLCONTEXT

public static final int SR_XMLCONTEXT
sr_flags bit setting.

See Also:
Constant Field Values

SR_GENERICCONTEXT

public static final int SR_GENERICCONTEXT
sr_flags bit setting.

See Also:
Constant Field Values

SR_PARMCONTEXT

public static final int SR_PARMCONTEXT
sr_flags bit setting.

See Also:
Constant Field Values

SR_PDFCONTEXT

public static final int SR_PDFCONTEXT
sr_flags bit setting.

See Also:
Constant Field Values

SR_GENERICIDS

public static final int SR_GENERICIDS
sr_flags bit setting.

See Also:
Constant Field Values

SR_USEDLLCALLBACK

public static final int SR_USEDLLCALLBACK
sr_flags bit setting.

See Also:
Constant Field Values

SR_INCLUDE_HTMLATTR

public static final int SR_INCLUDE_HTMLATTR
sr_flags bit setting.

See Also:
Constant Field Values

SR_EXCLUDE_XMLATTR

public static final int SR_EXCLUDE_XMLATTR
sr_flags bit setting.

See Also:
Constant Field Values

SR_USE_ALTDRI

public static final int SR_USE_ALTDRI
sr_flags bit setting. -- Place any added text into ALT_DRI

See Also:
Constant Field Values

SR_APPEND_DATES

public static final int SR_APPEND_DATES
sr_flags bit setting. -- Append normalized dates to added text field

See Also:
Constant Field Values

SR_REMOVE_FILEPATH

public static final int SR_REMOVE_FILEPATH
sr_flags bit setting. -- Remove path from file names in catalog

See Also:
Constant Field Values

SR_REMOVE_FILEEXT

public static final int SR_REMOVE_FILEEXT
sr_flags bit setting. -- Remove file extension from file names in catalog

See Also:
Constant Field Values

SR_DPDF

public static final int SR_DPDF
sr_flags bit setting. -- Disable Built-in PDF Filter

See Also:
Constant Field Values

SR_PDF_HILITE

public static final int SR_PDF_HILITE
sr_flags bit setting. -- Enable built-in PDF filter to collect hilite info.

See Also:
Constant Field Values

SR_PDF_CONTENTORDER

public static final int SR_PDF_CONTENTORDER
sr_flags bit setting. -- Enable raw order interpretation of PDF

See Also:
Constant Field Values

SR_PDF_PHYSORDER

public static final int SR_PDF_PHYSORDER
sr_flags bit setting. -- Enable physical order interpretation of PDF

See Also:
Constant Field Values

SR_DDOC

public static final int SR_DDOC
sr_flags bit setting. -- Disable Built-in Doc Filter

See Also:
Constant Field Values

SR_DOCCONTEXT

public static final int SR_DOCCONTEXT
sr_flags bit setting.

See Also:
Constant Field Values

SR_KEEPXMLFROMPDF

public static final int SR_KEEPXMLFROMPDF
sr_flags bit setting.

See Also:
Constant Field Values

SR_HTML_HTM_CONTEXT

public static final int SR_HTML_HTM_CONTEXT
sr_flags bit setting. -- html <html>,<title>, and <meta> context

See Also:
Constant Field Values

SR_SET_WORDBRK_EXT

public static final int SR_SET_WORDBRK_EXT
sr_flags bit setting. -- Set word break by file extent.

See Also:
Constant Field Values

SR_PROMOTE_ALTTITLE

public static final int SR_PROMOTE_ALTTITLE
sr_flags bit setting. -- Promote alternate title when title empty.

See Also:
Constant Field Values

SR_HASCONTEXT

public static final int SR_HASCONTEXT
sr_flags bit setting mask.

See Also:
Constant Field Values

SR_HTML

public static final int SR_HTML
Return code indicating type of file found. Included in ALT_DRI search region.

See Also:
Constant Field Values

SR_XML

public static final int SR_XML
File type value.

See Also:
Constant Field Values

SR_TEXT

public static final int SR_TEXT
File type value.

See Also:
Constant Field Values

SR_GENERIC

public static final int SR_GENERIC
File type value.

See Also:
Constant Field Values

SR_PDF

public static final int SR_PDF
File type value.

See Also:
Constant Field Values

SR_DOC

public static final int SR_DOC
File type value.

See Also:
Constant Field Values

ALT_DRI

public static int ALT_DRI
Default alternate DRI for hidden info.


IS_FILTER1

public static final int IS_FILTER1
Alternate name for IS_HTML

See Also:
Constant Field Values

IS_HTML

public static final int IS_HTML
catAddFile method filter parameter type

See Also:
Constant Field Values

IS_FILTER2

public static final int IS_FILTER2
Alternate name for IS_XML

See Also:
Constant Field Values

IS_XML

public static final int IS_XML
catAddFile method filter parameter type

See Also:
Constant Field Values

IS_FILTER3

public static final int IS_FILTER3
Alternate name for IS_GENERIC

See Also:
Constant Field Values

IS_GENERIC

public static final int IS_GENERIC
catAddFile method filter parameter type: C++ only

See Also:
Constant Field Values

IS_FILTER4

public static final int IS_FILTER4
Alternate name for IS_TEXT

See Also:
Constant Field Values

IS_TEXT

public static final int IS_TEXT
catAddFile method filter parameter type

See Also:
Constant Field Values

IS_FILTER5

public static final int IS_FILTER5
Alternate name for IS_AUTOTYPE

See Also:
Constant Field Values

IS_AUTOTYPE

public static final int IS_AUTOTYPE
catAddFile method filter parameter type

See Also:
Constant Field Values

IS_FILTER

public static final int IS_FILTER
catAddFile method filter parameter mask

See Also:
Constant Field Values
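
IS_FILTER is described as a mask for the catAddFile filter parameter. The sketch below shows how a mask of this kind is commonly applied to isolate the filter type from other bits in the same int. It assumes the filter type and the CatalogItem attrs bits can share one value, which this page does not state; the class name and all numeric values are placeholders.

```java
// Stand-in for illustration only. Names mirror SRC2STF_PARMS constants,
// but the values and the bit layout are assumptions, not the real ones.
final class FilterMaskSketch {
    static final int IS_HTML    = 0x01; // placeholder value
    static final int IS_XML     = 0x02; // placeholder value
    static final int IS_TEXT    = 0x04; // placeholder value
    static final int IS_FILTER  = 0x0F; // placeholder mask covering the filter types
    static final int IS_DELETED = 0x10; // placeholder value outside the mask

    /** Keeps only the bits selected by the filter mask. */
    static int filterType(int value) {
        return value & IS_FILTER;
    }

    public static void main(String[] args) {
        int value = IS_XML | IS_DELETED;
        System.out.println("is XML: " + (filterType(value) == IS_XML));
    }
}
```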

IS_DELETED

public static final int IS_DELETED
CatalogItem attrs field value.

See Also:
Constant Field Values

IS_ADD

public static final int IS_ADD
CatalogItem attrs field value.

See Also:
Constant Field Values

IS_PRIMARY

public static final int IS_PRIMARY
CatalogItem attrs field value.

See Also:
Constant Field Values

IS_UPDATE

public static final int IS_UPDATE
CatalogItem attrs field value.

See Also:
Constant Field Values

IS_FILTERED

public static final int IS_FILTERED
CatalogItem attrs field value.

See Also:
Constant Field Values

IS_ARCH1ST

public static final int IS_ARCH1ST
CatalogItem attrs field value.

See Also:
Constant Field Values

IS_ARCHN

public static final int IS_ARCHN
CatalogItem attrs field value.

See Also:
Constant Field Values

IS_COMP1ST

public static final int IS_COMP1ST
CatalogItem attrs field value.

See Also:
Constant Field Values

IS_COMPN

public static final int IS_COMPN
CatalogItem attrs field value.

See Also:
Constant Field Values

IS_ENTITY_EXTRACTED

public static final int IS_ENTITY_EXTRACTED
CatalogItem attrs field value.

See Also:
Constant Field Values

SR_INCLUDE_CDDATA

public static final int SR_INCLUDE_CDDATA
sr_includeWords parameter field values.
       SR_INCLUDE_CDDATA   - If bit set, include CDATA words in parse.
       SR_INCLUDE_COLL_HDR - If bit set, and processing XML composite
                             file, include words from encapsulating
                             header tag(s) for each composite file.
    

See Also:
Constant Field Values

SR_INCLUDE_COLL_HDR

public static final int SR_INCLUDE_COLL_HDR
sr_includeWords bit setting.

See Also:
Constant Field Values

sr_stfFile

public java.lang.String sr_stfFile
Where to put output token file


sr_processFile

public java.lang.String sr_processFile
Where to put catAddFile process file.


sr_logFile

public com.doclinx.jftr.Log sr_logFile
Conversion information log file.


sr_flags

public int sr_flags
SR_FLAGS (filter options), see flag values


sr_contextAaidx

public byte sr_contextAaidx
Context aaidx value(context attr)


sr_genericRoot

public java.lang.String sr_genericRoot
Optional generic filter ROOT tag


sr_appendToOutput

public boolean sr_appendToOutput
Append to output (else overwrite)


sr_includePunctuation

public boolean sr_includePunctuation
Place punct tokens in STF


sr_debug

public boolean sr_debug
Turn on parser debug info output


sr_enableJapanese

public boolean sr_enableJapanese
deprecated.


sr_maxWordChars

public int sr_maxWordChars
Maximum length of a word(255 max)


sr_regExpression

public java.lang.String sr_regExpression
Word break regular exp (C++ only)


sr_foldSettings

public int sr_foldSettings
Control bits for case folding


sr_includeWords

public int sr_includeWords
Control bits for word inclusion


sr_encoding

public java.lang.String sr_encoding
Text encoding to use (no detect)


sr_altTitle

public java.lang.String sr_altTitle
Alternate title


sr_indexAltTitle

public boolean sr_indexAltTitle
true if indexing alt title


sr_addedText

public java.lang.String sr_addedText
Other added (non-indexed) text


sr_JDBCDoc

public java.lang.String sr_JDBCDoc
Default JDBC Document wrapper


sr_URL

public java.lang.String sr_URL
URL text


sr_indexURL

public boolean sr_indexURL
Index URL text (set in map file)


sr_indexModTime

public boolean sr_indexModTime
Index file modified time


sr_includeList

public java.lang.String sr_includeList
File include list: Include format: *.xml;foo.*;file.ext


sr_excludeList

public java.lang.String sr_excludeList
File exclude list: Exclude format: *.xml;foo.*;file.ext
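
Both sr_includeList and sr_excludeList take a ';'-separated list of file patterns such as *.xml;foo.*;file.ext. The sketch below shows one way to check a file name against such a list before setting the fields. It assumes glob-style matching as implemented by java.nio.file.PathMatcher; the matching rules the parser itself applies are not documented here, and the class FileListSketch is hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper; glob semantics are an assumption about how
// the include and exclude lists are interpreted.
final class FileListSketch {
    /** Returns true if fileName matches any ';'-separated pattern in list. */
    static boolean matchesAny(String list, String fileName) {
        Path name = Paths.get(fileName);
        for (String pattern : list.split(";")) {
            String trimmed = pattern.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray or trailing separators
            }
            PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + trimmed);
            if (matcher.matches(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String list = "*.xml;foo.*;file.ext";
        for (String name : new String[] {"report.xml", "foo.txt", "file.ext", "bar.txt"}) {
            System.out.println(name + " -> " + matchesAny(list, name));
        }
    }
}
```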


sr_map8

public java.lang.Object sr_map8
Mapper for 8-bit encodings


sr_mapDirectory

public java.lang.String sr_mapDirectory
Map directory for map files(.txt)


sr_f1

public com.doclinx.ftxml.AppParms sr_f1
Parameter callback information -- Additional application data for a document. See AppParms class for more details.


sr_f2

public com.doclinx.ftxml.InputCallback sr_f2
Input callback function -- Open InputStream for reading. See InputCallback class for more details.


sr_globalParms

public java.lang.String sr_globalParms
Global user parameter data (use XML style tag).
 format: <GTAG p1='w w w' p2='wx w w'>

 For above example, the search path would be "wx in xpath /GTAG/@p2".
 This data is included and repeated for EVERY document processed
 using catAddFile() method.

 User data can be set on a document-by-document basis using the 
 parameter callback function. See AppParms class for more 
 details on user data.
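
 A minimal sketch of assembling a string in the format shown above from
 a set of name/value pairs. The helper class GlobalParmsSketch is
 hypothetical and not part of the library; it only builds the text that
 would then be assigned to sr_globalParms. Rejecting values that contain
 a single quote is a choice made for this sketch, since the page does
 not describe an escaping rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for building the sr_globalParms string.
final class GlobalParmsSketch {
    /** Builds <TAG name='value' ...> in the insertion order of attrs. */
    static String build(String tag, Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder("<").append(tag);
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            if (e.getValue().indexOf('\'') >= 0) {
                // No escaping rule is documented, so refuse rather than guess.
                throw new IllegalArgumentException("value contains a single quote");
            }
            sb.append(' ').append(e.getKey())
              .append("='").append(e.getValue()).append('\'');
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("p1", "w w w");
        attrs.put("p2", "wx w w");
        System.out.println(build("GTAG", attrs));
    }
}
```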
 


sr_dateFormats

public java.lang.String sr_dateFormats
Allow users to specify date formats for parsing -- Uses Java SimpleDateFormat format strings (in quotes) delimited by ';'
 Note the 4 defaults are: "MM/dd/yyyy";"MMMM dd,yyyy";
                          "yyyyMMdd";"MM/dd/yyyy HH:mm:ss";

 format: "fmt1";"fmt2";"fmt3"
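
 A minimal sketch of how a list in this format can be split into
 individual SimpleDateFormat patterns and tried in order against a date
 string. The class DateFormatsSketch, the use of Locale.ENGLISH, and the
 strict (non-lenient) matching are assumptions made for illustration;
 the parser's own matching order and leniency are not described here.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; not part of the library.
final class DateFormatsSketch {
    /** The four documented defaults, written as one list string. */
    static final String DEFAULTS =
        "\"MM/dd/yyyy\";\"MMMM dd,yyyy\";\"yyyyMMdd\";\"MM/dd/yyyy HH:mm:ss\"";

    /** Splits a list such as "fmt1";"fmt2" into bare pattern strings. */
    static List<String> patterns(String list) {
        List<String> out = new ArrayList<>();
        for (String part : list.split(";")) {
            String p = part.trim();
            if (p.length() >= 2 && p.startsWith("\"") && p.endsWith("\"")) {
                p = p.substring(1, p.length() - 1); // drop surrounding quotes
            }
            if (!p.isEmpty()) {
                out.add(p);
            }
        }
        return out;
    }

    /** Returns the first pattern that consumes the whole text, or null. */
    static String firstMatch(String list, String text) {
        for (String p : patterns(list)) {
            SimpleDateFormat f = new SimpleDateFormat(p, Locale.ENGLISH);
            f.setLenient(false);
            ParsePosition pos = new ParsePosition(0);
            Date d = f.parse(text, pos);
            if (d != null && pos.getIndex() == text.length()) {
                return p;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(patterns(DEFAULTS));
        System.out.println(firstMatch(DEFAULTS, "20240131"));
    }
}
```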
 


sr_outputFile

public java.lang.String sr_outputFile
Internal use only (not user parameter).


sr_filter

public int sr_filter
Internal use only (not user parameter).


sr_gpDll

public java.lang.String sr_gpDll
Internal use only (not user parameter).


sr_gpFilter

public com.doclinx.ftxml.GFilter sr_gpFilter
Internal use only (not user parameter).


sr_gpConfig

public java.lang.String sr_gpConfig
Internal use only (not user parameter).


sr_vsdf

public java.lang.String sr_vsdf
Internal use only (not user parameter).


sr_modTime

public com.doclinx.ftxml.FileTime sr_modTime
Internal use only (not user parameter).


sr_charProp

public com.doclinx.jftr.CharProp sr_charProp
Internal use only (not user parameter).


ht_stopOnFileError

public boolean ht_stopOnFileError
Stop if encountering parse error


ht_warnUnknown

public boolean ht_warnUnknown
Warn about unknown GIDs


ht_inputIsListOfFiles

public boolean ht_inputIsListOfFiles
deprecated


ht_docIdStart

public int ht_docIdStart
deprecated


ht_defFile

public java.lang.String ht_defFile
deprecated


ht_documentsProcessed

public int ht_documentsProcessed
Number of documents processed


dpapi_error

public int dpapi_error
Error code while building index

Constructor Detail

SRC2STF_PARMS

public SRC2STF_PARMS()
Constructor with default values for parse control parameters.
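
Because the parameters are public fields, a typical pattern is to construct the block and then overwrite only the fields that should differ from the defaults. The sketch below illustrates this with a stand-in class, since the real SRC2STF_PARMS is not available outside the library. The nested Parms class copies a few documented field names and types; its defaults, the file name out.stf, and the chosen values are hypothetical.

```java
// Illustration only: Parms is a stand-in that mirrors a few documented
// public fields of SRC2STF_PARMS. It is not the real class, and its
// default values are not the library's defaults.
final class ParmsUsageSketch {
    static final class Parms {
        String sr_stfFile;           // where to put output token file
        String sr_includeList;       // file include list
        boolean sr_appendToOutput;   // append to output (else overwrite)
        boolean ht_stopOnFileError;  // stop if encountering parse error
        int sr_maxWordChars;         // maximum length of a word (255 max)
    }

    /** Creates a parameter block and sets the fields a caller might change. */
    static Parms configure(String stfFile) {
        Parms p = new Parms();
        p.sr_stfFile = stfFile;
        p.sr_includeList = "*.xml";    // documented list format
        p.sr_appendToOutput = false;   // overwrite rather than append
        p.ht_stopOnFileError = true;
        p.sr_maxWordChars = 64;        // must not exceed the documented 255
        return p;
    }

    public static void main(String[] args) {
        Parms p = configure("out.stf"); // hypothetical output file name
        System.out.println(p.sr_stfFile + " " + p.sr_includeList);
    }
}
```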

Method Detail

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object